Treebanks gone bad: generating a treebank of ungrammatical English

Author: Foster Jennifer
Publication date: 01/01/2007

This paper describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people can), and can be used to induce a grammar capable of analysing such sentences. This paper also demonstrates the first of these uses.

DCU Online Research Access Service

"cba to check the spelling" investigating parser performance on discussion forum posts

Author: Foster Jennifer
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2010

We evaluate the Berkeley parser on text from an online discussion forum. We evaluate the parser output with and without gold tokens and spellings (using Sparseval and Parseval), and we compile a list of problematic phenomena for this domain. The Parseval f-score for a small development set is 77.56. This increases to 80.27 when we apply a set of simple transformations to the input sentences and to the Wall Street Journal (WSJ) training sections.
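For readers unfamiliar with the metric, the following is a minimal sketch of how a Parseval-style labelled bracketing f-score is computed. It is not the evaluation code used in the paper: the function name `parseval`, the `(label, start, end)` bracket representation, and the toy brackets are all assumptions made for illustration.

```python
from collections import Counter


def parseval(gold_brackets, test_brackets):
    """Return labelled bracket precision, recall and F1 as percentages.

    Each bracket is a (label, start, end) tuple. Counters are used so that
    duplicate brackets are matched at most as often as they occur in gold.
    """
    gold = Counter(gold_brackets)
    test = Counter(test_brackets)
    matched = sum((gold & test).values())  # multiset intersection
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if matched == 0:
        return 100 * precision, 100 * recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f1


# Hypothetical brackets for a five-token sentence.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 4, 5)]
p, r, f = parseval(gold, test)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Three of the four brackets match here, so precision, recall and F1 all come out at 75.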

CiteSeerX

Irish Universities

DCU Online Research Access Service

The Impact of International and NESB Students on Measured Learning and Standards in Australian Higher Education

Author: Jennifer Foster

Do international students and/or students from non-English language speaking backgrounds (NESB students) perform worse than other students in Australian undergraduate classrooms? What happens to other students' marks when these students are added to classrooms? I provide new empirical evidence on these questions using very recent administrative panel data from the business faculties of two Australian Technology Network universities. Results show that both international students and NESB students perform significantly worse than other students, even controlling for selection into courses. Both effects are large and do not disappear after the first semester, but non-English speaking background predicts substantially more of a reduction in marks than international student status. Adding international NESB students to a tutorial leads to a reduction in the marks of English-speaking students in that tutorial, whereas the marks of all students benefit from the addition of domestic NESB students to tutorials. Finally, evidence of an upward buoying effect on marks is found from adding international NESB students to courses, which is likely due to the presence of grading on a curve at the course level, but this effect is only felt by international NESB students themselves. Logic suggests that this rise is unlikely to be due to a true learning effect, implying that on average, international NESB students' already low marks are inflated in courses with large fractions of such students.

Keywords: higher education; Australia; peer effects; international students; NESB

Research Papers in Economics

Similarity rules! Exploring methods for ad-hoc rule detection

Authors: Dickinson Markus; Foster Jennifer
Publication date: 01/11/2008

We examine the role of similarity in ad hoc rule detection and show how previous methods can be made more corpus independent and more generally applicable. Specifically, we show that the similarity of a rule to others in the grammar is a crucial factor in determining the reliability of a rule, providing information unavailable in frequency. We also include a way to score rules which are not in the training data, thereby providing a platform for grammar generalization.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Utrecht University Repository

GenERRate: generating errors for use in grammatical error detection

Authors: Andersen Øistein E.; Foster Jennifer
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2009

This paper explores the issue of automatically generated ungrammatical data and its use in error detection, with a focus on the task of classifying a sentence as grammatical or ungrammatical. We present an error generation tool called GenERRate and show how GenERRate can be used to improve the performance of a classifier on learner data. We describe initial attempts to replicate Cambridge Learner Corpus errors using GenERRate.
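The general idea of generating ungrammatical data from well-formed sentences can be sketched as follows. This is not GenERRate itself and does not reproduce its configuration or interface: the three token-level operations (deletion, insertion, substitution), the function names, the `VOCABULARY` list and the `CONFUSIONS` table are all hypothetical choices made for illustration.

```python
import random

# Hypothetical resources: words that may be inserted, and confusion sets
# listing plausible wrong replacements for a given token.
VOCABULARY = ["the", "of", "to"]
CONFUSIONS = {"in": ["on", "at"], "is": ["are"]}


def delete_token(tokens, rng):
    """Remove one randomly chosen token."""
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def insert_token(tokens, rng, vocabulary):
    """Insert one vocabulary word at a random position."""
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + [rng.choice(vocabulary)] + tokens[i:]


def substitute_token(tokens, rng, confusions):
    """Replace one token that has a confusion set; None if there is none."""
    candidates = [i for i, tok in enumerate(tokens) if tok in confusions]
    if not candidates:
        return None
    i = rng.choice(candidates)
    return tokens[:i] + [rng.choice(confusions[tokens[i]])] + tokens[i + 1:]


def generate_errors(sentence, seed=0):
    """Return a dict of ill-formed variants of a whitespace-tokenised sentence."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    tokens = sentence.split()
    variants = {
        "deletion": delete_token(tokens, rng),
        "insertion": insert_token(tokens, rng, VOCABULARY),
        "substitution": substitute_token(tokens, rng, CONFUSIONS),
    }
    return {kind: " ".join(toks) for kind, toks in variants.items() if toks is not None}


variants = generate_errors("The cat is in the garden", seed=1)
for kind, text in variants.items():
    print(f"{kind}: {text}")
```

Each variant paired with its source sentence gives one ungrammatical and one grammatical training example for a binary classifier of the kind the abstract mentions.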

CiteSeerX

Irish Universities

DCU Online Research Access Service

Tobit or OLS? An Empirical Evaluation Under Different Diary Window Lengths

Authors: Charlene Kalenkoski; Jennifer Foster

Time use researchers frequently debate whether it is more appropriate to fit censored regression (Tobit) models using maximum likelihood estimation or linear models using ordinary least squares (OLS) to explain individuals’ allocations of time to different activities as recorded in time-diary data. One side argues that estimation of Tobit models addresses the significant censoring (i.e., large numbers of zeros) typically found in time-diary data and that OLS estimation leads to biased and inconsistent estimates. The opposing side argues that optimization occurs over a longer period than that covered by the typical time diary, and thus that reported zeros represent measurement error rather than true non-participation in the activity, in which case OLS is preferred. We use the Australian Time Use Surveys, which record information for two consecutive diary days, to estimate censored and linear versions of a parental child care model for both 24-hour and 48-hour windows of observation in order to determine the empirical consequences of estimation technique and diary length. We find a moderate amount of measurement error when we use the 24-hour window compared to the 48-hour window, but a large number of zeros in the shorter window remain zeros when we double the window length. Most of the qualitative conclusions we draw are similar for the two windows of observation and the two estimation methods, although there are some slight differences in the magnitudes and statistical significance of the estimates. Importantly, Tobit estimates appear to be more sensitive to window length than OLS estimates.

Keywords: Tobit; OLS; time-diary data
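As background to the debate, the two estimators can be written out in their standard textbook form. The notation is an assumption and is not taken from the paper: $y_i$ is the diary time person $i$ reports for the activity, $x_i$ a covariate vector, and $\phi$ and $\Phi$ the standard normal density and distribution function.

```latex
% Tobit: observed time is a latent normal variable censored at zero
y_i^{*} = x_i^{\top}\beta + \varepsilon_i, \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^{2}), \qquad
y_i = \max(0,\, y_i^{*})

% Log-likelihood maximised by the Tobit estimator
\ln L(\beta, \sigma) =
  \sum_{i:\, y_i > 0} \left[ \ln \phi\!\left( \frac{y_i - x_i^{\top}\beta}{\sigma} \right) - \ln \sigma \right]
  + \sum_{i:\, y_i = 0} \ln \Phi\!\left( -\frac{x_i^{\top}\beta}{\sigma} \right)

% OLS: fits the observed times directly and treats zeros like any other value
\hat{\beta}_{\mathrm{OLS}} = \arg\min_{\beta} \sum_{i} \left( y_i - x_i^{\top}\beta \right)^{2}
```

The second sum in the log-likelihood is where the two sides of the debate part ways: it treats every reported zero as true censoring, which is the assumption the measurement-error argument disputes.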

Research Papers in Economics

The Multitasking of Household Production

Authors: Charlene Kalenkoski; Jennifer Foster

The standard household production model does not incorporate multitasking, although time-diary data reveal that individuals regularly multi-task. We formulate a model where time spent in child care can be sole-tasked or multitasked with other household production activities. This model implies associations between household productivity factors and both child outcomes and parental time use. We then use data from the Longitudinal Study of Australian Children and the Australian Time Use Surveys to examine the empirical validity of these implications. Consistent with our model's predictions, household productivity factors are associated both with child outcomes and parental time use.

Research Papers in Economics

Treebank Embedding Vectors for Out-of-domain Dependency Parsing

Authors: Barry James; Foster Jennifer; Wagner Joachim
Publication date: 01/01/2020

A recent advance in monolingual dependency parsing is the idea of a treebank embedding vector, which allows all treebanks for a particular language to be used as training data while at the same time allowing the model to prefer training data from one treebank over others and to select the preferred treebank at test time. We build on this idea by 1) introducing a method to predict a treebank vector for sentences that do not come from a treebank used in training, and 2) exploring what happens when we move away from predefined treebank embedding vectors during test time and instead devise tailored interpolations. We show that 1) there are interpolated vectors that are superior to the predefined ones, and 2) treebank vectors can be predicted with sufficient accuracy, for nine out of ten test languages, to match the performance of an oracle approach that knows the most suitable predefined treebank embedding for the test set.

Comment: Camera ready for ACL 202
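To make "interpolation" concrete, here is a minimal sketch of forming a weighted combination of predefined treebank embedding vectors. It is an illustration only, not the authors' implementation: the function `interpolate`, the treebank names `tb_a` and `tb_b`, the two-dimensional vectors and the weights are all hypothetical, and the sketch deliberately does not require the weights to sum to one.

```python
def interpolate(vectors, weights):
    """Return the weighted sum of treebank embedding vectors.

    vectors: dict mapping treebank name -> list of floats (equal lengths)
    weights: dict mapping the same treebank names -> float weight
    """
    if set(vectors) != set(weights):
        raise ValueError("vectors and weights must cover the same treebanks")
    dim = len(next(iter(vectors.values())))
    combined = [0.0] * dim
    for name, vec in vectors.items():
        if len(vec) != dim:
            raise ValueError("all treebank vectors must have the same length")
        for k, value in enumerate(vec):
            combined[k] += weights[name] * value
    return combined


# Hypothetical learned vectors for two treebanks of one language.
treebank_vectors = {"tb_a": [1.0, 0.0], "tb_b": [0.0, 1.0]}

# A predefined vector corresponds to putting all weight on one treebank;
# an interpolated vector spreads the weight across treebanks.
predefined = interpolate(treebank_vectors, {"tb_a": 1.0, "tb_b": 0.0})
mixed = interpolate(treebank_vectors, {"tb_a": 0.75, "tb_b": 0.25})
print(predefined, mixed)
```

In this picture, choosing a treebank at test time amounts to picking one corner of the weight space, and a tailored interpolation is any other point in it.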

arXiv.org e-Print Archive

Crossref

Irish Universities

DCU Online Research Access Service

Using parse features for preposition selection and error detection

Authors: Chodorow Martin; Foster Jennifer; Tetreault Joel
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2010

We evaluate the effect of adding parse features to a leading model of preposition usage. Results show a significant improvement in the preposition selection task on native speaker text and a modest increment in precision and recall in an ESL error detection task. Analysis of the parser output indicates that it is robust enough in the face of noisy non-native writing to extract useful information.

CiteSeerX

Irish Universities

DCU Online Research Access Service